1.  INTRODUCTION


In this report, we examine the regression problem
considered by Li and Hwang (1984).  A number of important
Navy problems may be cast in the form of regression
problems.  Villalobos and Wahba (1987), for instance, note
that this is the case with the task of estimating posterior
probabilities in classification problems.  Whereas Li and
Hwang (1984) take the errors in their regression problem
to be normally distributed, we allow for a more general
class of error distributions to accommodate problems in
which this normality assumption is not satisfied.

Suppose that observations $y_1, y_2, \ldots, y_n$ are made at
levels $x_1, x_2, \ldots, x_n$ with

$$ y_j = s(x_j) + e_j \tag{1.1} $$

where the function $s$ is unknown and the $e_j$ are independent
random errors having mean zero.  Using vector notation, we
may write (1.1) as

$$ y = \mu + e \tag{1.1'} $$

where $y = (y_1, \ldots, y_n)^t$, $\mu = (\mu_1, \ldots, \mu_n)^t =
(s(x_1), \ldots, s(x_n))^t$, and $e = (e_1, \ldots, e_n)^t$.  Note that
the observed vector $y$ is a simple estimate of the unknown
vector $\mu$.

Li and Hwang (1984) consider estimates $\hat\mu$ of $\mu$ of the form

$$ \hat\mu = (1-c)\,y + c\,M_n y = y - c\,(I - M_n)\,y \tag{1.2} $$
See section [...] the choice of $M_n$.

(Errata  for  Li  and  Hwang  (1984)  are  given  in  Appendix  A.) 


Figure 1.1.  Geometry of the Li and Hwang (1984) estimate
$\hat\mu = (1-c)\,y + c\,M_n y$.


2.  PEARSON  RANDOM  VARIABLES 


As mentioned in the introduction, Li and Hwang
(1984) relied on a result of Stein (1981) to establish
choices of $c$ in (1.2) so that $\hat\mu$ dominates $y$ as an estimator
of $\mu$ when the errors are independent, identically
distributed normals.  Stein (1981) established his result
with the aid of an identity which is satisfied for normal
random variables.  Specifically, if $X$ is a normal random
variable with mean $\theta$ and variance $\sigma^2$, then for any suitable
function $h$

$$ E\,(X-\theta)\,h(X) = \sigma^2\, E\,h'(X) \tag{2.1} $$

where $E$ denotes the expectation operator.  Since an identity
of this sort holds for random variables having distributions
in the Pearson (1895) class (see Hudson (1978), Johnson
(1984), or Haff and Johnson (1986a)), which includes the
normal, we suppose that the errors in our regression problem
have Pearson distributions.

Specifically, we assume that the errors are
independent, with $e_j$ having probability density function
$f_j(w)$, where

$$ f_j'(w) = \frac{\theta_j - w}{\beta_{j0} + \beta_{j1} w + \beta_{j2} w^2}\, f_j(w). $$

We say that $e_j$ has a Pearson density with parameters $\theta_j$,
$\beta_{j0}$, $\beta_{j1}$, $\beta_{j2}$, respectively.  For future reference, let

$$ a_j(w) = \frac{\beta_{j0} + \beta_{j1} w + \beta_{j2} w^2}{1 - 2\beta_{j2}} \tag{2.2} $$

$$ b_j(w) = \int^{w} \frac{dt}{a_j(t)} \tag{2.3} $$

Note that the $b_j$ are only specified to within some arbitrary
constant of integration.  Estimates of $\mu$, yet to be
presented, will involve the functions $a_j$ and $b_j$.

Examples of random variables having Pearson densities
are listed in Table 2.1.  For these densities, the Pearson
parameterization and the functions $a(\cdot)$ and $b(\cdot)$ are listed
in Table 2.2.

We  now  state  an  extension  of  (2.1)  to  the  Pearson 
family. 


Theorem 2.1:  Let $X$ be a Pearson random variable with
density $f$ on the interval $(r,s)$ and $a(\cdot)$ defined by (2.2).
If $h(\cdot)$ is a differentiable function such that

$$ \lim_{x \to r} a(x)h(x)f(x) = \lim_{x \to s} a(x)h(x)f(x) = 0, \tag{2.4} $$

then

$$ E\,(X-\nu)\,h(X) = E\,a(X)\,h'(X) \tag{2.5} $$

where $\nu = (\theta+\beta_1)/(1-2\beta_2)$, provided these expectations exist.


Proof:  See  Hudson  (1978),  Johnson  (1984),  or  Haff  and 
Johnson  (1986a)  for  a  proof  using  integration  by  parts.  ■ 
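
As an illustrative check, not part of the original report, identity (2.5) can be verified numerically for one Pearson density.  The sketch below assumes a gamma variable with rate parameterization (for which Table 2.2 gives $a(w) = w/\beta$ and the mean is $\nu = \alpha/\beta$), the test function $h(x) = x^2$, and a simple midpoint rule; the function names and the truncation point of the integral are choices made here for illustration only.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) density with rate beta (assumed parameterization)."""
    return beta**alpha * x**(alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def expect(g, alpha, beta, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of E g(X) for X ~ Gamma(alpha, beta).

    The integral is truncated at `upper`; the neglected tail is negligible
    for the parameter values used below.
    """
    dx = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += g(x) * gamma_pdf(x, alpha, beta)
    return total * dx

alpha, beta = 3.0, 2.0
nu = alpha / beta                 # mean of X

def a(x):
    return x / beta               # a(w) = w / beta for the gamma density

def h(x):
    return x**2                   # illustrative test function

def h_prime(x):
    return 2.0 * x

lhs = expect(lambda x: (x - nu) * h(x), alpha, beta)    # E (X - nu) h(X)
rhs = expect(lambda x: a(x) * h_prime(x), alpha, beta)  # E a(X) h'(X)
print(lhs, rhs)
```

For these values both sides equal $E X^3 - \nu E X^2 = 7.5 - 4.5 = 3$ and $2 E X^2/\beta = 3$ analytically, so the two printed numbers should agree up to quadrature error.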
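Table 2.1.  Examples of Pearson densities.

Normal, $N(\theta, \sigma^2)$
    Density:   $(2\pi\sigma^2)^{-1/2} \exp[-(x-\theta)^2/(2\sigma^2)]$,  $-\infty < x < \infty$
    Mean:      $E X = \theta$
    Variance:  $\mathrm{Var}\,X = \sigma^2$

Beta, $B(\alpha, \beta)$
    Density:   $\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$,  $0 < x < 1$  ($\alpha, \beta > 0$)
    Mean:      $E X = \dfrac{\alpha}{\alpha+\beta}$
    Variance:  $\mathrm{Var}\,X = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

Gamma, $\Gamma(\alpha, \beta)$
    Density:   $\dfrac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x)$,  $x > 0$  ($\alpha, \beta > 0$)
    Mean:      $E X = \alpha/\beta$
    Variance:  $\mathrm{Var}\,X = \alpha/\beta^2$

Reciprocal gamma, $I\Gamma(\alpha, \beta)$
    Density:   $\dfrac{\beta^{\alpha} \exp(-\beta/x)}{x^{\alpha+1}\,\Gamma(\alpha)}$,  $x > 0$  ($\alpha, \beta > 0$)
    Mean:      $E X = \dfrac{\beta}{\alpha-1}$,  $\alpha > 1$
    Variance:  $\mathrm{Var}\,X = \dfrac{\beta^2}{(\alpha-1)^2(\alpha-2)}$,  $\alpha > 2$

T, $t(\alpha, \theta, \sigma^2)$
    Density:   $\dfrac{\Gamma((\alpha+1)/2)}{(\alpha\pi\sigma^2)^{1/2}\,\Gamma(\alpha/2)} \left[1 + \dfrac{(x-\theta)^2}{\alpha\sigma^2}\right]^{-(\alpha+1)/2}$,  $-\infty < x < \infty$  ($\alpha > 0$)
    Mean:      $E X = \theta$,  $\alpha > 1$
    Variance:  $\mathrm{Var}\,X = \dfrac{\alpha\sigma^2}{\alpha-2}$,  $\alpha > 2$

F, $F(\alpha, \beta)$
    Density:   $\dfrac{\Gamma((\alpha+\beta)/2)\,\alpha^{\alpha/2}\beta^{\beta/2}}{\Gamma(\alpha/2)\,\Gamma(\beta/2)}\, x^{\alpha/2-1}(\beta+\alpha x)^{-(\alpha+\beta)/2}$,  $x > 0$  ($\alpha, \beta > 0$)
    Mean:      $E X = \dfrac{\beta}{\beta-2}$,  $\beta > 2$
    Variance:  $\mathrm{Var}\,X = \dfrac{2\beta^2(\alpha+\beta-2)}{\alpha(\beta-2)^2(\beta-4)}$,  $\beta > 4$

Power
    Density:   $\theta\, k^{-\theta} x^{\theta-1}$,  $0 < x < k$  ($\theta > 0$)
    Mean:      $E X = \dfrac{k\theta}{\theta+1}$
    Variance:  $\mathrm{Var}\,X = \dfrac{k^2\theta}{(\theta+2)(\theta+1)^2}$

Pareto
    Density:   $\theta\, k^{\theta} x^{-(\theta+1)}$,  $x > k > 0$  ($\theta > 0$)
    Mean:      $E X = \dfrac{k\theta}{\theta-1}$,  $\theta > 1$
    Variance:  $\mathrm{Var}\,X = \dfrac{k^2\theta}{(\theta-1)^2(\theta-2)}$,  $\theta > 2$

Pearson Type IV
    Density:   $\propto\ Q(x)^{-1/(2\beta_2)} \exp\left[\dfrac{Q'(\theta)}{\beta_2 k} \arctan\left(\dfrac{Q'(x)}{k}\right)\right]$,  $-\infty < x < \infty$,
               where $Q(x) = \beta_0 + \beta_1 x + \beta_2 x^2$ and $k^2 = 4\beta_0\beta_2 - \beta_1^2 > 0$
    Mean:      $m = E X = \dfrac{\theta+\beta_1}{1-2\beta_2}$,  $\beta_2 < 1/2$
    Variance:  $\mathrm{Var}\,X = \dfrac{Q(m)}{1-3\beta_2}$,  $\beta_2 < 1/3$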

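Table 2.2.  Some Pearson parameterizations.  For each density
the parameters are listed in the order $(\theta, \beta_0, \beta_1, \beta_2)$.

$N(\theta, \sigma^2)$
    Parameters:  $(\theta,\ \sigma^2,\ 0,\ 0)$
    $a(w)$:      $\sigma^2$
    $b(w)$:      $w/\sigma^2$

$B(\alpha, \beta)$
    Parameters:  $\left(\dfrac{\alpha-1}{\alpha+\beta-2},\ 0,\ \dfrac{1}{\alpha+\beta-2},\ \dfrac{-1}{\alpha+\beta-2}\right)$
    $a(w)$:      $\dfrac{w(1-w)}{\alpha+\beta}$
    $b(w)$:      $(\alpha+\beta) \ln\dfrac{w}{1-w}$

$\Gamma(\alpha, \beta)$
    Parameters:  $((\alpha-1)/\beta,\ 0,\ 1/\beta,\ 0)$
    $a(w)$:      $w/\beta$
    $b(w)$:      $\beta \ln w$

$I\Gamma(\alpha, \beta)$
    Parameters:  $(\beta/(\alpha+1),\ 0,\ 0,\ 1/(\alpha+1))$
    $a(w)$:      $w^2/(\alpha-1)$
    $b(w)$:      $-(\alpha-1)/w$

$t(\alpha, \theta, \sigma^2)$
    Parameters:  $\left(\theta,\ \dfrac{\alpha\sigma^2+\theta^2}{\alpha+1},\ \dfrac{-2\theta}{\alpha+1},\ \dfrac{1}{\alpha+1}\right)$
    $a(w)$:      $\dfrac{\alpha\sigma^2 + (w-\theta)^2}{\alpha-1}$
    $b(w)$:      $\dfrac{\alpha-1}{(\alpha\sigma^2)^{1/2}} \arctan\dfrac{w-\theta}{(\alpha\sigma^2)^{1/2}}$

$F(\alpha, \beta)$
    Parameters:  $\left(\dfrac{\beta(\alpha-2)}{\alpha(\beta+2)},\ 0,\ \dfrac{2\beta}{\alpha(\beta+2)},\ \dfrac{2}{\beta+2}\right)$
    $a(w)$:      $\dfrac{2w}{\beta-2}\,(w + \beta/\alpha)$
    $b(w)$:      $-\dfrac{\alpha(\beta-2)}{2\beta} \ln\left(1 + \dfrac{\beta}{\alpha w}\right)$

Power
    Parameters:  $\left(0,\ 0,\ 0,\ \dfrac{-1}{\theta-1}\right)$
    $a(w)$:      $\dfrac{-w^2}{\theta+1}$
    $b(w)$:      not useful

Pareto
    Parameters:  $\left(0,\ 0,\ 0,\ \dfrac{1}{\theta+1}\right)$
    $a(w)$:      $\dfrac{w^2}{\theta-1}$
    $b(w)$:      not useful

Pearson Type IV
    Parameters:  $(\theta,\ \beta_0,\ \beta_1,\ \beta_2)$ with $k^2 = 4\beta_0\beta_2 - \beta_1^2 > 0$
    $a(w)$:      $\dfrac{Q(w)}{1-2\beta_2}$
    $b(w)$:      $\dfrac{2(1-2\beta_2)}{k} \arctan\dfrac{Q'(w)}{k}$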

Note that Theorem 2.1 reduces correctly in the event $X$
is a normal random variable.  For, from Table 2.2, $\nu =
(\theta+\beta_1)/(1-2\beta_2) = \theta$ and $a(x) = \sigma^2$.  Substituting into (2.5) we
obtain (2.1) provided $h$ satisfies the conditions of Theorem
2.1.

An understanding of $\nu$ and $a(\cdot)$ in Theorem 2.1 is given
by the following result:

Corollary 2.1:  If Theorem 2.1 is satisfied with $h(x) = 1$ and
$h(x) = x$, then

$$ E\,X = \nu = (\theta+\beta_1)/(1-2\beta_2) \quad\text{and} $$

$$ \mathrm{Var}\,X = E\,a(X) = (\beta_0 + \beta_1\nu + \beta_2\nu^2)/(1-3\beta_2). $$

Hence, in this case, $\nu$ is the mean of $X$ and $a(X)$ is an
unbiased estimate of the variance of $X$.


Proof:  Set $h(X) = 1$ and $h(X) = (X-\nu)$, respectively.  ■


Higher order moments of Pearson variables may be found with
the aid of a recurrence formula derived from (2.5) by
setting $h(x) = (x-\nu)^n$.  The first four moments, for example,
may be used to determine bounds on tail probabilities.  See
Appendix C for details.
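
One way such a recurrence can be written out is sketched below; it is an illustration worked from (2.2) and (2.5), and the form used in Appendix C may differ.  Writing $m_k = E\,(X-\nu)^k$, $Q(x) = \beta_0 + \beta_1 x + \beta_2 x^2$, and expanding $Q$ about $\nu$, the choice $h(x) = (x-\nu)^n$ in (2.5) gives $m_{n+1} = n\,[Q(\nu)\,m_{n-1} + Q'(\nu)\,m_n]\,/\,[1-(n+2)\beta_2]$, with $m_0 = 1$ and $m_1 = 0$.  The function name and the guard on the denominator are assumptions made for this sketch.

```python
def pearson_central_moments(theta, b0, b1, b2, order):
    """Central moments m_0, ..., m_order of a Pearson variable.

    Uses the recurrence obtained from (2.5) with h(x) = (x - nu)^n:
        m_{n+1} = n * (Q(nu) * m_{n-1} + Q'(nu) * m_n) / (1 - (n + 2) * b2)
    where Q(x) = b0 + b1*x + b2*x^2.
    """
    nu = (theta + b1) / (1.0 - 2.0 * b2)   # mean, from Theorem 2.1
    q_nu = b0 + b1 * nu + b2 * nu**2       # Q(nu)
    dq_nu = b1 + 2.0 * b2 * nu             # Q'(nu)
    m = [1.0, 0.0]
    for n in range(1, order):
        denom = 1.0 - (n + 2) * b2
        if denom <= 0.0:
            # Assumed guard: the recurrence is only meaningful when the
            # denominator is positive (e.g. enough degrees of freedom for a t).
            raise ValueError(f"moment of order {n + 1} is not available")
        m.append(n * (q_nu * m[n - 1] + dq_nu * m[n]) / denom)
    return m[: order + 1]

# Gamma(alpha=3, beta=2): parameters ((alpha-1)/beta, 0, 1/beta, 0) from Table 2.2.
gamma_moments = pearson_central_moments(1.0, 0.0, 0.5, 0.0, order=4)
# Standard normal: parameters (0, 1, 0, 0).
normal_moments = pearson_central_moments(0.0, 1.0, 0.0, 0.0, order=4)
print(gamma_moments, normal_moments)
```

For $n = 1$ the recurrence reduces to the variance formula of Corollary 2.1.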

Throughout this report, we use the fact that if $e_j$ is a
Pearson random variable having mean zero, then $y_j = \mu_j + e_j$
is a Pearson random variable having mean $\mu_j$.  This is a
consequence of the following result:


Theorem 2.2:  If $U$ is a Pearson random variable with
parameters $m, r, s, t$, then $V = eU + f$ is a Pearson random
variable with parameters $em+f$, $e^2r - efs + f^2t$, $es - 2ft$, $t$.
Furthermore

$$ a_V(v) = e^2\, a_U(u) \quad\text{and}\quad b_V(v) = b_U(u)/e. $$


Proof:  See Kaskey, Krishnaiah, Kolman, and Steinberg (1980)
for the proof that $V$ is a Pearson random variable with the
given parameters.  The expressions for $a_V(v)$ and $b_V(v)$
follow by direct calculation from (2.2) and (2.3).  ■
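
The parameter map of Theorem 2.2 is easy to mechanize.  The following sketch is illustrative only: the helper names are hypothetical, and the example takes $U$ to be a $t(5, 0, 1)$ variable with the Table 2.2 parameterization $(0, 5/6, 0, 1/6)$.  It applies the map for $V = 2U + 3$ and checks two consequences: the mean of $V$ is $e \cdot E\,U + f$, and $a_V(v) = e^2 a_U(u)$ at $v = eu + f$.

```python
def pearson_affine(params, e, f):
    """Pearson parameters of V = e*U + f, following Theorem 2.2."""
    m, r, s, t = params
    return (e * m + f, e**2 * r - e * f * s + f**2 * t, e * s - 2.0 * f * t, t)

def pearson_mean(params):
    """Mean nu = (theta + beta1) / (1 - 2*beta2) from Theorem 2.1."""
    theta, b0, b1, b2 = params
    return (theta + b1) / (1.0 - 2.0 * b2)

def pearson_a(params, w):
    """The function a(w) of (2.2)."""
    theta, b0, b1, b2 = params
    return (b0 + b1 * w + b2 * w * w) / (1.0 - 2.0 * b2)

u_params = (0.0, 5.0 / 6.0, 0.0, 1.0 / 6.0)   # t(5, 0, 1), assumed example
e, f = 2.0, 3.0
v_params = pearson_affine(u_params, e, f)

u = 1.0
v = e * u + f
print(v_params, pearson_mean(v_params), pearson_a(v_params, v))
```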


Our problem is to choose $c$ so that $\hat\mu = y - c(I-M_n)y$ (recall
(1.2)) is a good estimate of $\mu$ when $y$ is a vector of
independent Pearson random variables having mean $\mu$.  In this
section we approach this problem from a Bayesian
perspective.  In particular, we suppose that $\mu$ has a prior
distribution $\Pi(\mu)$.  As the optimal $c$ is best understood in
terms of the Bayes estimate, call it $\delta^B$, of $\mu$, we begin by
stating $\delta^B$.

Theorem 3.1:  Suppose that $Y$ is an $n \times 1$ vector of independent
Pearson random variables for which $a_1(y_1), \ldots, a_n(y_n)$,
given by (2.2), are completely specified.  Then the Bayes
estimate of $\mu$ with respect to squared error loss, provided
it exists, is given componentwise by

$$ \delta_i^B(y) = y_i + a_i'(y_i) + a_i(y_i)\,\frac{\partial \ln f(y)}{\partial y_i} \tag{3.1} $$

where

$$ f(y) = \int \prod_{i=1}^{n} f_i(y_i \mid \mu_i)\; d\Pi(\mu) \tag{3.2} $$

is the marginal density of $y$.

Proof:  See  Johnson  (1984),  p.  31,  or  Haff  and  Johnson 


(1986a),  p.  46.  ■ 


Example 3.1:  Suppose that $Y_i$, $i = 1, \ldots, n$, are independent
$N(\mu_i, \sigma^2)$ random variables with $\sigma^2$ known.  Then, from Table
2.2, $a_i(y_i) = \sigma^2$.  Assume that $\mu_i$, $i = 1, \ldots, n$, are independent
$N(\gamma_i, \tau^2)$ random variables with the $\gamma_i$ and $\tau^2$ known.  A
standard calculation (see, for example, Berger (1985), pp.
127-128) shows that $f(y)$, the marginal distribution of $y$, is
the multivariate normal density with mean $\gamma = (\gamma_1, \ldots, \gamma_n)^t$
and covariance matrix $(\sigma^2+\tau^2)I$.  Substituting $a_i(y_i)$ and $f(y)$
into (3.1) we obtain

$$ \delta_i^B(y) = (1-r)\,y_i + r\,\gamma_i $$

where $r = \sigma^2/(\sigma^2+\tau^2)$.  Note that $0 < r < 1$, so that the Bayes
estimate of $\mu_i$ lies between $y_i$ and $\gamma_i$.  Also note in this
example that $\delta_i^B$ depends on $y$ only through $y_i$.  In general,
$\delta_i^B$ may depend upon all of the components of $y$.
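
The substitution in Example 3.1 can be traced numerically.  The sketch below is an illustration only: it evaluates (3.1) componentwise for the normal model, using $a_i(y_i) = \sigma^2$, $a_i'(y_i) = 0$, and the derivative of the log of the normal marginal density, and compares the result with the shrinkage form of the example.  The function names, the symbol `gamma` for the prior mean, and the numerical values are assumptions chosen here.

```python
def bayes_estimate_via_3_1(y, gamma, sigma2, tau2):
    """Evaluate (3.1) for Y_i ~ N(mu_i, sigma2), mu_i ~ N(gamma_i, tau2)."""
    a = sigma2          # a_i(y_i) = sigma^2 from Table 2.2
    a_prime = 0.0       # derivative of the constant a_i
    total_var = sigma2 + tau2
    # d/dy_i of ln f(y) when f is the N(gamma, (sigma2 + tau2) I) density
    score = [-(yi - gi) / total_var for yi, gi in zip(y, gamma)]
    return [yi + a_prime + a * si for yi, si in zip(y, score)]

def shrinkage_form(y, gamma, sigma2, tau2):
    """The closed form (1 - r) y_i + r gamma_i with r = sigma2 / (sigma2 + tau2)."""
    r = sigma2 / (sigma2 + tau2)
    return [(1.0 - r) * yi + r * gi for yi, gi in zip(y, gamma)]

y = [2.0, -1.0, 0.5]
gamma = [0.0, 0.0, 1.0]
via_formula = bayes_estimate_via_3_1(y, gamma, sigma2=1.0, tau2=3.0)
closed_form = shrinkage_form(y, gamma, sigma2=1.0, tau2=3.0)
print(via_formula, closed_form)
```

With $\sigma^2 = 1$ and $\tau^2 = 3$ the shrinkage weight is $r = 1/4$, so each component moves one quarter of the way from $y_i$ toward the prior mean.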


Example 3.2:  Suppose that $Y_i$, $i = 1, \ldots, n$, are independent
$I\Gamma(\alpha_i, \beta_i)$ random variables with the $\alpha_i$ known.  So, from
Table 2.2, $a_i(y_i) = y_i^2/(\alpha_i - 1)$.  Also assume the improper prior

$$ \Pi(\beta) = \prod_{i=1}^{n} \beta_i^{\,r_i} $$

for $\beta$.  Some calculation (c.f. Example 3.4 of Haff and
Johnson (1986a)) reveals

$$ f(y) = \prod_{i=1}^{n} \frac{\Gamma(\alpha_i + r_i + 1)}{\Gamma(\alpha_i)}\; y_i^{\,r_i} $$

so that the formal Bayes estimate is

$$ \delta_i^B(y) = \frac{\alpha_i + r_i + 1}{\alpha_i - 1}\; y_i . $$


In Theorem 3.2, which follows, we present the Bayes
estimate of $\mu$ among the class of estimates $\hat\mu = y - c(I-M_n)y$
with respect to squared error loss.  Such a Bayes estimate
may be referred to as a "restricted Bayes" estimate, for we
restrict ourselves to looking at estimates of a given form.
In contrast, $\delta^B$ given by (3.1) may be thought of as the
"unrestricted Bayes" estimate of $\mu$.

Before stating Theorem 3.2, we provide a heuristic
derivation of the restricted Bayes estimate of the form
(1.2).  Consider Figure 3.1.  Pictured are the estimates $y$,
$M_n y$, and $\delta^B$ (given by (3.1)) of $\mu$.  If we are going to
restrict our attention to those estimates of $\mu$ which lie
along the line $\ell$ through $y$ and $M_n y$, then our Bayesian
perspective leads us to say the best estimate of $\mu$ will be
that point on $\ell$ nearest $\delta^B$.  So we desire $\delta^B - \hat\mu$ to be
orthogonal to $\ell$.  This orthogonality implies

$$ (\delta^B - \hat\mu)^t (y - M_n y) = 0 $$

which, writing out $\hat\mu$ and simplifying, gives

$$ c = c^* = \frac{(y-\delta^B)^t (I-M_n)\,y}{\|(I-M_n)\,y\|^2} \tag{3.3} $$

where $\|\cdot\|$ denotes the Euclidean norm, $\|w\|^2 = w^t w$.

Figure 3.1.  Geometry of the restricted Bayes estimate.

To summarize, the choice $c = c^*$ yields an estimate $\hat\mu$ of
$\mu$ along the line connecting $y$ and $M_n y$, which minimizes the
Euclidean distance of $\hat\mu$ from the Bayes estimate $\delta^B$.
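
The projection just described can be sketched in a few lines.  The example is illustrative only: the $3 \times 3$ averaging matrix standing in for $M_n$, the vector used for $\delta^B$, and the helper names are all hypothetical choices, not taken from the report.  The code computes $c^*$ from (3.3), forms $\hat\mu = y - c^*(I-M_n)y$, and checks that $\delta^B - \hat\mu$ is orthogonal to $(I-M_n)y$.

```python
def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def restricted_bayes(y, delta_b, m_n):
    """Return (c_star, mu_hat): the point on the line through y and M_n y
    that is nearest delta_b, using (3.3)."""
    my = matvec(m_n, y)
    dy = [yi - mi for yi, mi in zip(y, my)]            # (I - M_n) y
    diff = [yi - di for yi, di in zip(y, delta_b)]     # y - delta^B
    c_star = dot(diff, dy) / dot(dy, dy)
    mu_hat = [yi - c_star * di for yi, di in zip(y, dy)]
    return c_star, mu_hat

m_n = [[1.0 / 3.0] * 3 for _ in range(3)]   # hypothetical smoother: overall mean
y = [1.0, 2.0, 6.0]
delta_b = [1.5, 2.0, 5.0]                   # hypothetical Bayes estimate

c_star, mu_hat = restricted_bayes(y, delta_b, m_n)
dy = [yi - mi for yi, mi in zip(y, matvec(m_n, y))]
residual = [di - mi for di, mi in zip(delta_b, mu_hat)]
print(c_star, mu_hat, dot(residual, dy))
```

Here $(I-M_n)y = (-2, -1, 3)^t$ and $y - \delta^B = (-0.5, 0, 1)^t$, so $c^* = 4/14 = 2/7$, and the final printed inner product is zero up to rounding.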

Sufficient conditions under which $\hat\mu$ with $c = c^*$ is the Bayes
estimator with respect to the class of estimators $\hat\mu$ are now
given in Theorem 3.2:




Theorem 3.2:  Suppose $Y$ is an $n \times 1$ vector of independent
Pearson random variables having finite second moments.  Let
$\lambda(y) = (\lambda_1(y), \ldots, \lambda_n(y))^t = c^*(I-M_n)y$, where $c^*$ is given
by (3.3).  If (2.4) holds componentwise for $h(y) = \lambda_i(y)$, and
(3.4) holds, then the Bayes estimator of the form

$$ \hat\mu = y - c\,(I-M_n)\,y $$

is given with $c = c^*$ in (3.3).

Proof:  As the conditions of Theorem 2.5 of Haff and Johnson
(1986a) are satisfied, we may apply this result to obtain

$$ R(\hat\mu, \mu) = R(Y, \mu) + E\,[-2a^t\nabla\lambda + \lambda^t\lambda] $$

where $a = a(y) = (a_1(y_1), \ldots, a_n(y_n))^t$ and
$\nabla\lambda = (\partial\lambda_1/\partial y_1, \ldots, \partial\lambda_n/\partial y_n)^t$.  Therefore

$$ r(\hat\mu) = r(Y) + \int \left[ \int_y (-2a^t\nabla\lambda + \lambda^t\lambda)\, f(y \mid \mu)\, dy \right] d\Pi(\mu), $$

where $f(y \mid \mu)$ denotes the integrand in (3.2).  Now,
supposing

$$ \int\!\!\int_y (|2a^t\nabla\lambda| + \lambda^t\lambda)\, f(y \mid \mu)\, dy\, d\Pi(\mu) < \infty \tag{3.4} $$

we may apply Fubini's Theorem (see, for example, Rudin
(1974), p. 150) to obtain

$$ r(\hat\mu) = r(Y) + \int_y (-2a^t\nabla\lambda + \lambda^t\lambda)\, f(y)\, dy $$

where $f(y)$ is given by (3.2).  Integrating by parts, we find

$$ \int_y a^t\nabla\lambda\; f(y)\, dy = \int_y (y-\delta^B)^t\lambda\; f(y)\, dy, \tag{3.5} $$

the boundary terms vanishing as a consequence of the
assumption that (2.4) holds for each $\lambda_i$.  Therefore

$$ r(\hat\mu) = r(Y) + \int_y [-2(y-\delta^B)^t\lambda + \lambda^t\lambda]\, f(y)\, dy. $$

Up until now the computations have been performed for
any $\lambda$ which satisfies the necessary assumptions.  Taking $\lambda =
cDy$ where $D = (I-M_n)$, we find

$$ r(\hat\mu) = r(Y) + \int_y \left(-2c\,(y-\delta^B)^t Dy + c^2\|Dy\|^2\right) f(y)\, dy. \tag{3.6} $$

Denoting the dependence of $\hat\mu$ on $c$ by writing $r(\hat\mu) = r(\hat\mu(c))$,
and using $(y-\delta^B)^t Dy = c^*\|Dy\|^2$ from (3.3), we find

$$ r(\hat\mu(c+\psi)) - r(\hat\mu(c)) = \int_y \psi\,[\,2(c-c^*) + \psi\,]\, \|Dy\|^2 f(y)\, dy $$

for $\psi = \psi(y)$.  Consequently

$$ r(\hat\mu(c^*+\psi)) = r(\hat\mu(c^*)) + \int_y \psi^2\, \|Dy\|^2 f(y)\, dy \;\ge\; r(\hat\mu(c^*)) \tag{3.7} $$

so that $c = c^*$ is the choice of $c$ minimizing the Bayes risk.
In order for $c = c^*$ to be the unique choice of $c$ (up to an
equivalence class of functions whose members are equal
a.e.), we implicitly assume that $\|Dy\|^2 f(y) > 0$ a.e. with
respect to Lebesgue measure over the region of integration.  ■


Theorem 2.5 of Haff and Johnson (1986a), used in the
proof of Theorem 3.2, is an easy extension of Theorem 2.1.

In the sequel, we will let $\hat\mu^*$ denote the estimate $\hat\mu$
with $c = c^*$.  Substitution of $\delta^B$ into $\hat\mu^*$ in the previous
Examples 3.1 and 3.2 is easily accomplished.


Example 3.1 (continued):  In this case, we have

$$ \hat\mu^* = y - \frac{\sigma^2}{\sigma^2+\tau^2}\; \frac{(y-\gamma)^t (I-M_n)\,y}{\|(I-M_n)\,y\|^2}\; (I-M_n)\,y. $$


Example 3.2 (continued):  Here

$$ \hat\mu^* = y + \frac{y^t Q\,(I-M_n)\,y}{\|(I-M_n)\,y\|^2}\; (I-M_n)\,y $$

where $Q$ is the diagonal matrix $Q = \mathrm{diag}((r_1+2)/(\alpha_1-1), \ldots,
(r_n+2)/(\alpha_n-1))$.

n  n 


Note that if $\delta^B = y - g(y)(I-M_n)y$, then $\hat\mu^* = \delta^B$.  In other
words, if the Bayes estimate lies on the line $\ell$ (recall
Figure 3.1), then the restricted Bayes estimate is equal to
the Bayes estimate.

Using (3.7) of Theorem 3.2, we may state the amount of
improvement in Bayes risk of the estimate $\hat\mu^*$ over any other
estimate on $\ell$.  As an example, we state the improvement over
the estimate $y$ of $\mu$ in the following corollary:


Corollary 3.1:  The improvement in Bayes risk of the
estimate $\hat\mu^*$ over the estimate $y$ of $\mu$ is

$$ \int_y [c^*]^2\, \|(I-M_n)\,y\|^2 f(y)\, dy. $$


Proof:  Let $\psi = -c^*$ in (3.7) so that the left-hand side of
this equation is the Bayes risk of $y$.  ■


In Theorem 3.2, we presented the optimal Bayes estimate
of the form $\hat\mu = y - c(I-M_n)y$ where $c$ is a function, $c = c(y)$:
$\mathbb{R}^n \to \mathbb{R}^1$.  In Theorem 3.3, we present the optimal Bayes
estimate in the event that we restrict $c$ to being a
constant.


Theorem 3.3:  Given the setting and assumptions of Theorem
3.2, the Bayes estimate of the form

$$ \hat\mu = y - c\,(I-M_n)\,y, $$

where $c$ is a constant, is given with

$$ c = \bar c = \frac{\int_y (y-\delta^B)^t (I-M_n)\,y\; f(y)\, dy}{\int_y \|(I-M_n)\,y\|^2 f(y)\, dy} = \frac{E^Y (Y-\delta^B)^t (I-M_n)\,Y}{E^Y \|(I-M_n)\,Y\|^2} \tag{3.8} $$

$$ \phantom{c = \bar c} = \frac{\int_y \mathrm{tr}[A(I-M_n)]\; f(y)\, dy}{\int_y \|(I-M_n)\,y\|^2 f(y)\, dy} = \frac{E^Y\, \mathrm{tr}[A(I-M_n)]}{E^Y \|(I-M_n)\,Y\|^2} \tag{3.9} $$

where $A = \mathrm{diag}(a_1(Y_1), \ldots, a_n(Y_n))$ is a diagonal matrix and
tr denotes the trace operator.

Proof:  Differentiating (3.6) with respect to $c$, we obtain

$$ \frac{d\,r(\hat\mu(c))}{dc} = -2\int_y (y-\delta^B)^t Dy\; f(y)\, dy + 2c \int_y \|Dy\|^2 f(y)\, dy $$

where $D = (I-M_n)$.  Noting that this derivative is zero for $c = \bar c$
as given in the integral expression of (3.8), and

$$ \frac{d^2 r(\hat\mu(c))}{dc^2} = 2\int_y \|Dy\|^2 f(y)\, dy > 0, $$

we see that (3.8) holds.  That (3.9) holds is a consequence
of rewriting the numerator integral of (3.8) by using (3.5)
with $\lambda = (I-M_n)y$.  ■

n 

Comparing (3.3) with the first expression for $\bar c$ in
(3.8) we see that $\bar c$ may be viewed as an approximation to $c^*$.
In particular, taking the expected value of the numerator of
(3.3) and dividing by the expected value of the denominator
of (3.3) we obtain $\bar c$.  The expectations here are with
respect to the marginal density of $Y$ as given in (3.2).

In the sequel, we will let $\bar\mu$ denote the estimate $\hat\mu$ with
$c = \bar c$.  Note from (3.8) that $\bar\mu = \delta^B$ when $\delta^B = Y - c(I-M_n)Y$ for some
constant $c$.

Example 3.1 (continued):  Using (3.9) we find $\bar c = E^Y\, \mathrm{tr}[AD] /
E^Y \|DY\|^2$, where $D = (I-M_n)$ and $A = \mathrm{diag}(a_1(Y_1), \ldots, a_n(Y_n)) = \sigma^2 I$.
It follows that

$$ \bar c = \frac{\sigma^2\, \mathrm{tr}\,D}{(\sigma^2+\tau^2)\, \mathrm{tr}\,D^tD + \|D\gamma\|^2}, $$

the denominator expectation evaluated by using Theorem 4.6.1
on p. 139 of Graybill (1976).  Finally

$$ \bar\mu = y - \frac{\sigma^2\, \mathrm{tr}\,D}{(\sigma^2+\tau^2)\, \mathrm{tr}\,D^tD + \|D\gamma\|^2}\; Dy $$

where $D = (I-M_n)$.
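
The closed form for $\bar c$ in the normal case can be evaluated directly.  The sketch below is an illustration under stated assumptions: $M_n$ is taken to be a hypothetical $3 \times 3$ averaging matrix, the prior mean is written `gamma`, and the helper name is invented for this example.  It uses the identity that $\mathrm{tr}\,D^tD$ equals the sum of the squared entries of $D$.

```python
def c_bar_normal(d, gamma, sigma2, tau2):
    """Constant c-bar for the normal/normal model:
    sigma2 * tr D / ((sigma2 + tau2) * tr(D^t D) + ||D gamma||^2)."""
    n = len(d)
    tr_d = sum(d[i][i] for i in range(n))
    tr_dtd = sum(d[i][j] ** 2 for i in range(n) for j in range(n))
    d_gamma = [sum(d[i][j] * gamma[j] for j in range(n)) for i in range(n)]
    norm2 = sum(v * v for v in d_gamma)
    return sigma2 * tr_d / ((sigma2 + tau2) * tr_dtd + norm2)

n = 3
# D = I - M_n with M_n the (hypothetical) overall-mean smoother.
d = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

c_flat = c_bar_normal(d, gamma=[0.0, 0.0, 0.0], sigma2=1.0, tau2=3.0)
c_spread = c_bar_normal(d, gamma=[1.0, 2.0, 6.0], sigma2=1.0, tau2=3.0)
print(c_flat, c_spread)
```

For this $D$, $\mathrm{tr}\,D = \mathrm{tr}\,D^tD = 2$.  When $D\gamma = 0$ the constant reduces to $\sigma^2/(\sigma^2+\tau^2) = 1/4$; with $\gamma = (1, 2, 6)^t$, $\|D\gamma\|^2 = 14$ and $\bar c = 2/22 = 1/11$, so a prior mean far from the smoothed fit reduces the amount of shrinkage.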


Example 3.2 (continued):  In this example $\bar c$ is undefined as
the expectations involved do not exist.  This is a result of
having placed an improper prior on $\beta$.


As done in Corollary 3.1 for $\hat\mu^*$, we may compute the
improvement in Bayes risk with $\bar\mu$.


Corollary 3.2:  The improvement in Bayes risk of the
estimate $\bar\mu$ over the estimate $y$ of $\mu$ is

$$ \int_y [\bar c]^2\, \|(I-M_n)\,y\|^2 f(y)\, dy. $$


Proof:  Substitute $\bar c$ for $c$ in (3.6).  ■


Because the class of estimates $\hat\mu = y - c(I-M_n)y$, where $c =
c(y)$ is a function of $y$, contains that in which $c$ is a
constant, $\hat\mu^*$ will outperform $\bar\mu$ in terms of Bayes risk.
Also, by design, both $\hat\mu^*$ and $\bar\mu$ outperform the estimates $Y$
and $M_nY$ in terms of Bayes risk.  To summarize:

$$ r(\hat\mu^*) \le r(\bar\mu) \le \min\{\, r(Y),\ r(M_nY) \,\}. $$

We can, of course, look at estimates of the form (1.2)
for restrictions on $c = c(y)$ other than those already chosen.
So far we have taken a look at the two extremes of such
restrictions.  The estimate $\hat\mu^*$ resulted in having placed
no restrictions on $c$, and the estimate $\bar\mu$ resulted in having
restricted $c$ to being a constant.  In the theorem which
follows, we look at estimates of the form (1.2) with $c(y) =
d/\|(I-M_n)y\|^2$, where $d$ is a constant.  The resulting Bayes
estimate restricted to this class of choices of (1.2) will
be useful later in understanding estimates which have good
(ordinary) risk.


Theorem 3.4:  Given the setting and assumptions of Theorem
3.2 with $\lambda = \|(I-M_n)y\|^{-2}(I-M_n)y$, the Bayes estimate of the
form

$$ \hat\mu = y - \frac{d}{\|(I-M_n)\,y\|^2}\; (I-M_n)\,y $$

where $d$ is a constant, is given with

$$ d = d^* = \frac{E^Y\left[ \dfrac{\mathrm{tr}\,AD}{\|DY\|^2} - \dfrac{2\,Y^tD^tDADY}{\|DY\|^4} \right]}{E^Y \|DY\|^{-2}} \tag{3.10} $$

$$ \phantom{d = d^*} = \frac{E^Y\left[ \dfrac{(Y-\delta^B)^t DY}{\|DY\|^2} \right]}{E^Y \|DY\|^{-2}} \tag{3.11} $$

where $A = \mathrm{diag}(a_1(Y_1), \ldots, a_n(Y_n))$ is a diagonal matrix, $D =
(I-M_n)$, and tr denotes the trace operator.


Proof:  Similar to the proofs of Theorems 3.2 and 3.3,
and thus is omitted.  ■

Comment:  Note that if we remove the expectations in (3.11)
the estimate $\hat\mu$ becomes that of Theorem 3.2.


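In the special case that $Y$ is a vector of independent
$N(\mu_i, \sigma^2)$ random variables, we have $A = \sigma^2 I$ and (3.10) becomes

$$ d^* = \sigma^2\; \frac{E^Y\left[ \left( \mathrm{tr}\,D - 2\,\dfrac{Y^tD^tDDY}{\|DY\|^2} \right) \|DY\|^{-2} \right]}{E^Y \|DY\|^{-2}}. \tag{3.12} $$

Since

$$ \lambda_{\min}(\tilde D) \le (DY)^t D\,(DY)/\|DY\|^2 \le \lambda_{\max}(\tilde D), $$

where $\tilde D = (D + D^t)/2$, we may write upper and lower bounds for
(3.12).  Namely,

$$ \sigma^2[\,\mathrm{tr}\,D - 2\lambda_{\max}(\tilde D)\,] \le d^* \le \sigma^2[\,\mathrm{tr}\,D - 2\lambda_{\min}(\tilde D)\,]. \tag{3.13} $$

By Theorem 4.2 of the next section, $\hat\mu$ dominates $Y$ with $d$
equal to the lower bound of (3.13).

Note that if $D$ is an idempotent matrix (i.e., $D^2 = D$),
then $Y^tD^tDDY = \|DY\|^2$ and (3.12) becomes

$$ d^* = \sigma^2[\,\mathrm{tr}\,D - 2\,] \tag{3.14} $$

which, by Theorem 9.1.5 of Graybill (1983), p. 300, is also

$$ d^* = \sigma^2(\mathrm{rank}\,D - 2). \tag{3.14'} $$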


4.  DOMINANT  ESTIMATES 


In this section, we digress from our main discussion
regarding estimates of the form of (1.2) to present two
dominance results.  Both of these dominance results
generalize work done by Stein (1973, 1981) under the
assumption of normality.  The first dominance result states
sufficient conditions on the marginal density (3.2) under
which the Bayes estimate, $\delta^B$, dominates the estimate $Y$ of $\mu$.
This result was proved in Haff and Johnson (1986a).  The
second dominance result also looks at estimates which
improve upon the estimate $Y$ of $\mu$.  It was derived
independently by Johnson (in an unpublished work) and Chou
(1988).  Actually, the result of Johnson is somewhat more
general; compare Theorem 3.1 of Chou (1988) with Theorem 4.2
below.

Before stating the first dominance result, we present
some notation to be used throughout this section.  For a
vector $Y = (Y_1, \ldots, Y_n)^t$ of independent Pearson random
variables, let

$$ g(y) = f(y) \prod_{i=1}^{n} a_i(y_i) \tag{4.1} $$

where $f(y)$ is given by (3.2) and the $a_i(y_i)$ are given by
(2.2).  Also let

$$ \nabla_B = (\partial/\partial b_1,\ \partial/\partial b_2,\ \ldots,\ \partial/\partial b_n)^t, $$

$$ B = B(Y) = (b_1(Y_1), \ldots, b_n(Y_n))^t, $$

where the $b_i = b_i(y_i)$ are given by (2.3).  With this notation,
we may write the Bayes estimate of $\mu$ more simply.  In
particular, we may rewrite (3.1) as $\delta^B = Y + \nabla_B \log g(Y)$.
Finally, let

$$ \nabla_B^2 h = \sum_{i=1}^{n} \frac{\partial^2 h}{\partial b_i^2}. $$


Theorem 4.1:  Suppose that $Y$ is an $n \times 1$ vector of independent
Pearson random variables satisfying the conditions of
Theorem 3.1.  Let $\lambda(y) = (\lambda_1(y), \ldots, \lambda_n(y))^t = \nabla_B \log g(y)$,
where $g$ is defined by (4.1).  If (2.4) holds componentwise
for $h(y) = \lambda_i(y)$ and $E^Y \lambda^t\lambda < \infty$, then

$$ R(\delta^B, \mu) = R(Y, \mu) + 4\, E\, \nabla_B^2[g(Y)^{1/2}] \,/\, [g(Y)^{1/2}]. \tag{4.2} $$

Consequently, when dealing with squared error loss, $\delta^B$
dominates $Y$ as an estimate of $\mu$ if

$$ \nabla_B^2[g(Y)^{1/2}] < 0. \tag{4.3} $$


Proof:  Noting that $\delta^B = Y + \nabla_B \log g(Y) = Y + \lambda(Y)$, apply
Theorem 2.5 of Haff and Johnson (1986a) to obtain

$$ R(\delta^B, \mu) = R(Y, \mu) + E\,[2a^t\nabla\lambda + \lambda^t\lambda] $$

(c.f. the proof of Theorem 3.2).  Rewriting the expression
in square brackets, we have

$$ R(\delta^B, \mu) = R(Y, \mu) + 4\, E \sum_{i=1}^{n} \frac{\dfrac{\partial^2}{\partial b_i^2} \exp\left[\tfrac{1}{2}\int \lambda_i\, db_i\right]}{\exp\left[\tfrac{1}{2}\int \lambda_i\, db_i\right]}. $$

This reduces to (4.2) with $\lambda_i = \partial \log g/\partial b_i$.  Finally, when
(4.3) holds, the above expectation is negative so that
$R(\delta^B, \mu) < R(Y, \mu)$.  ■


We now state our second dominance result.


Theorem 4.2:  Suppose $Y$ is an $n \times 1$ vector of independent
Pearson random variables having finite second moments.  Let
$\lambda(y) = c(y)DB(y)$, where $D$ is a specified $n \times n$ matrix of
constants and $c(y): \mathbb{R}^n \to \mathbb{R}^1$ remains to be specified.  If (2.4)
holds componentwise for $h(y) = \lambda_i(y)$ and $E^Y \lambda^t\lambda < \infty$, then

$$ \tilde\mu = Y - c(Y)\,DB(Y) \quad\text{dominates } Y $$

as an estimate of $\mu$ with respect to squared error loss
for . . .

(i) Symmetric $D$ when

$$ c(y) = \{\, B^t[(\mathrm{tr}\,D)I - 2D]^{-1}D^2B \,\}^{-1} $$

and the largest eigenvalue of $D$, $\lambda_{\max}(D)$, is less than
$(\mathrm{tr}\,D)/2$.

(ii) Arbitrary $D$ when

$$ c(y) = \frac{(\mathrm{tr}\,D) - 2\lambda^*}{\|DB\|^2} $$

and $\lambda^* = \lambda^*(D) = \lambda_{\max}((D+D^t)/2)$ is less than $(\mathrm{tr}\,D)/2$.


Comment:  It should be noted that the dominant estimates $\tilde\mu$
in the above theorem are not of the form (1.2) unless $B(Y)$
is a scalar multiple of $Y$.  This only happens when $Y$ is a
vector of normal variates.  When $Y$ is a vector of normal
variates, note that $\tilde\mu$ with $c$ in case (ii) is of the form $\hat\mu$,
the restricted Bayes estimate, given in Theorem 3.4 with
$D = (I-M_n)$.


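Proof:  Note that $c(y) = (B^tNB)^{-1}$ with a symmetric $N$ for each
of the two choices of $c$ in the Theorem.  Applying Theorem
2.5 of Haff and Johnson (1986a) and using the symmetry of $N$,
we find

$$ \Delta R = R(Y, \mu) - R(\tilde\mu, \mu) = E\,\|Y-\mu\|^2 - E\,\|(Y - c(Y)DB(Y)) - \mu\|^2 $$

$$ \phantom{\Delta R} = E \left[ \frac{-4B^tD^tNB}{(B^tNB)^2} + \frac{2(\mathrm{tr}\,D)}{B^tNB} - \frac{\|DB\|^2}{(B^tNB)^2} \right]. \tag{4.4} $$

We desire to show $\Delta R > 0$ for cases (i) and (ii) above.

Case (i):  Suppose $D = D^t$.  Simple algebra gives

$$ \Delta R = E\; \frac{B^t\left( 2[(\mathrm{tr}\,D)I - 2D]N - D^2 \right)B}{(B^tNB)^2}. $$

Taking $N = [(\mathrm{tr}\,D)I - 2D]^{-1}D^2/\gamma$, and writing $N_1 =
[(\mathrm{tr}\,D)I - 2D]^{-1}D^2$ for the value of $N$ at $\gamma = 1$, we obtain

$$ \Delta R = \gamma(2-\gamma)\; E\, \|DB\|^2/(B^tN_1B)^2 > 0 $$

for $0 < \gamma < 2$, with the greatest improvement in $\Delta R$
coinciding with $\gamma = 1$.  This completes the proof of case (i).

Case (ii):  Taking $N = D^tD/\gamma$, (4.4) becomes

$$ \Delta R = E \left[ -4\gamma\, \frac{B^tD^tD^tDB}{\|DB\|^4} + \frac{\gamma\,[\,2(\mathrm{tr}\,D) - \gamma\,]}{\|DB\|^2} \right]. $$

But

$$ \frac{B^tD^tD^tDB}{\|DB\|^2} \le \max_{\|z\|=1} z^tD^tz = \max_{\|z\|=1} z^t((D+D^t)/2)\,z = \lambda_{\max}((D+D^t)/2) = \lambda^*. $$

So, assuming $\gamma > 0$,

$$ \Delta R \ge E \left[ \frac{-4\gamma\lambda^* + \gamma\,[\,2(\mathrm{tr}\,D) - \gamma\,]}{\|DB\|^2} \right] > 0 $$

for $0 < \gamma < 2[(\mathrm{tr}\,D) - 2\lambda^*]$.  The right-hand side is maximized
with $\gamma = (\mathrm{tr}\,D) - 2\lambda^*$.  This completes the proof of case (ii).  ■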


Under the assumption of normality, case (i) of Theorem
4.2 was established by Stein (1981; p. 1142) and case (ii)
was established by Li and Hwang (1984; Proposition 1, p.
892).

One of the assumptions of Theorem 4.1 is that $E\,\lambda^t\lambda$ be
finite.  We give sufficient conditions for this to be the
case in the following result:

Theorem 4.3:  The quantity $E\,\lambda^t\lambda = E\,c(Y)^2\|DB\|^2$ is finite in
case (i) if $D$ is positive definite.  It is finite in case
(ii) if $D$ is of full rank.


Proof:  With $c(y) = (B^tNB)^{-1}$,

$$ E\,\lambda^t\lambda = E\; \frac{\|DB\|^2}{(B^tNB)^2}. $$

Case (i):  Note that the matrices $[(\mathrm{tr}\,D)I - 2D]^{-1}$ and $D^2$
commute and are symmetric.  Applying Theorem 10.6.8, p. 322,
of Mirsky (1972), there exists an orthogonal matrix $P$ such
that

$$ P^t[(\mathrm{tr}\,D)I - 2D]^{-1}P = Q \quad\text{and}\quad P^tD^2P = R $$

where $Q$ and $R$ are diagonal matrices.  Hence

$$ E\,\lambda^t\lambda = E\; \frac{(P^tB)^tR\,(P^tB)}{[(P^tB)^tQR\,(P^tB)]^2} \le \frac{\max_i r_i}{(\min_i q_i r_i)^2}\; E\,\|B\|^{-2}. $$

In the previous expression, the $r_i$ are the diagonal entries
of $R$, and the $q_i$ are the diagonal entries of $Q$.
Since $D$ is positive definite, the $r_i$ are positive.  Also,
$\lambda_{\max}(D) < (\mathrm{tr}\,D)/2$ implies that the $q_i$ are positive.  Since
$\lambda_{\max}(D) < (\mathrm{tr}\,D)/2$ implies $n \ge 3$ (use the fact that the
trace of a matrix is equal to the sum of its eigenvalues),
it suffices to show that $E\,\|B\|^{-2}$ is finite for $n \ge 3$ to
complete the proof of case (i).

Case (ii):  There exists an orthogonal matrix $P$, by Theorem 10.3.4 of
Mirsky (1972), such that $P^tD^tDP = R$, where $R$ is a diagonal
matrix.  Consequently

$$ E\,\lambda^t\lambda = [(\mathrm{tr}\,D) - 2\lambda^*]^2\; E\,\|DB\|^{-2} = [(\mathrm{tr}\,D) - 2\lambda^*]^2\; E\,((P^tB)^tR\,(P^tB))^{-1} \le \frac{[(\mathrm{tr}\,D) - 2\lambda^*]^2}{\min_i r_i}\; E\,\|B\|^{-2}. $$

By Theorem 12.22 of Graybill (1983), $D$ of full rank insures
that none of the non-negative eigenvalues $r_i$ of $D^tD$ are
zero.  Again, since $\lambda^* < (\mathrm{tr}\,D)/2$ implies $n \ge 3$, it suffices
to show that $E\,\|B\|^{-2}$ is finite for $n \ge 3$ to complete the
proof of case (ii).




So,  to  prove  the  theorem  in  its  entirety,  it  remains  to 
show  that  E  II B II  2  is  finite  for  n  >  3.  From  (1.2)  of  Haff 
and  Johnson  (1986a)  we  obtain 

E  IIBII"2  =  J  J  ll B ll ”2  f ( y | m )  dyt-  ■  •  dyn 
where  f(y|^)  is 

n 

n  pAtu.)  ajyj"1  exp(pv  f  a.  (y.  )"‘dy.  -  J  y.  a.  (y.  ) '  ‘dy. ) . 

i  =  1 

Noting  that  JdyVdb  |  =  a^(y^)  the  change  of  variables  x.= 
b  (y  )  gives 

t  V 

n 

E  HBir2  =  f  J"  Hxll"2  n  exp(M.  X  -v  (M  )  )k  (x  )  dx  •  •  •  dx  . 

■J  **  11  vtvvvv  1  n 

V  =  1 

Since  this  integral  is  bounded  over  the  region  HxH2  >  6  (by 
1/6),  it  remains  to  show  that  the  above  integral  is  finite 
over  the  region  llxll2  <  6  when  n  >  3.  Now  rewrite  the 

Integral  in  terms  of  the  polar  coordinates 

\[
\begin{aligned}
x_1 &= r \cos\theta_1 \\
x_2 &= r \sin\theta_1 \cos\theta_2 \\
x_3 &= r \sin\theta_1 \sin\theta_2 \cos\theta_3 \\
    &\;\;\vdots \\
x_{n-2} &= r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-3} \cos\theta_{n-2} \\
x_{n-1} &= r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-3} \sin\theta_{n-2} \cos\theta_{n-1} \\
x_n &= r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-3} \sin\theta_{n-2} \sin\theta_{n-1}
\end{aligned}
\]

where $0 \le \theta_i \le \pi$ for $i = 1, 2, \ldots, n-2$, and $0 \le \theta_{n-1} \le 2\pi$.
The Jacobian, $J$, in this case is

\[
J = r^{n-1} \prod_{i=1}^{n-2} (\sin\theta_i)^{n-i-1} .
\]


Noting that $\|x\|^2 = r^2$, the transformed integrand becomes
$r^{n-1} r^{-2} = r^{n-3}$ times a function bounded in the sphere $r^2 < \delta$
(the $k_i$ are bounded in the sphere if we assume a continuous
density and (2.4) holds componentwise with $h(x) = 1$).
Consequently, $E\|B\|^{-2}$ is finite for $n \ge 3$. This completes
the proof. ■
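The role of the condition $n \ge 3$ in this last step can be made explicit with a short calculation. This is only a sketch: it isolates the radial factor $r^{n-3}$ and takes for granted that the remaining bounded factor and the angular integrals contribute a finite constant.

```latex
\[
\int_0^{\sqrt{\delta}} r^{\,n-3}\, dr
  = \frac{\delta^{(n-2)/2}}{n-2} < \infty
  \qquad (n \ge 3),
\]
% For n = 2 the radial integrand is r^{-1}, which is not
% integrable at the origin, so the argument breaks down there.
```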

We illustrate Theorem 4.2 with two examples. For ease
of presentation, we choose $D = I$ in each. With this selection
of $D$, we have $c(y) = (n-2)/\|B\|^2$ in both case (i) and case
(ii), giving

\[
\hat\mu = Y - \frac{(n-2)}{\|B\|^2}\, B . \tag{4.5}
\]

For $\hat\mu$ to dominate $Y$, we require $1 = \lambda_{\max}(I) < (\operatorname{tr} I)/2 = n/2$
(i.e., $n \ge 3$). As a third example, the interested reader
may wish to consult section 5 of Stein (1981). Here, in the
normal case, Stein considers the choice of the weight in a
three-term symmetric moving average.

Example 4.1 (James and Stein (1961)): Suppose $Y$ is an $n \times 1$
vector of independent $N(\mu_i, \sigma^2)$ random variables. From Table
2.2, $B = B(Y) = Y/\sigma^2$. Hence, for $n \ge 3$,

\[
\hat\mu = Y - \frac{(n-2)\sigma^2}{\|Y\|^2}\, Y
        = \Big( 1 - \frac{(n-2)\sigma^2}{\|Y\|^2} \Big) Y
\]

dominates $Y$ as an estimate of $\mu$ with respect to squared
error loss. Recalling that the components of $B$ are
determined only up to a constant (see (2.3) and the
discussion which follows), we may generalize the above by
taking $B = B(Y) = (Y - \nu)/\sigma^2$, where $\nu$ is any specified $n \times 1$
vector of constants. In particular, for $n \ge 3$,

\[
\hat\mu = Y - \frac{(n-2)\sigma^2}{\|Y - \nu\|^2}\, (Y - \nu) \tag{4.6}
\]

dominates $Y$. From (4.6) we see that the estimate $\hat\mu$ corrects
the estimate $Y$ by an amount $-[(n-2)\sigma^2/\|Y-\nu\|^2]\,(Y-\nu)$. For $n \ge 3$,
the $i$th component of this correction term is negative if
$Y_i > \nu_i$, is zero if $Y_i = \nu_i$, and is positive if $Y_i < \nu_i$.
Consequently, we may view the estimate (4.6) componentwise
as modifying the estimate $Y_i$ by moving it toward (and, in
some cases, beyond) $\nu_i$. In practice, this estimate performs
best when $\nu_i$ is close to $\mu_i$, $i = 1, \ldots, n$.
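The estimate (4.6) is simple to compute. The following is a minimal sketch, not code from the report: the function name `stein_estimate` is hypothetical, and returning `y` unchanged when `y` equals `nu` is our own convention for the case in which the correction term is undefined.

```python
from typing import List, Sequence


def stein_estimate(y: Sequence[float], sigma2: float,
                   nu: Sequence[float]) -> List[float]:
    """Estimate (4.6): correct y by -(n-2)*sigma2/||y-nu||^2 * (y-nu)."""
    n = len(y)
    if n < 3:
        raise ValueError("the dominance result requires n >= 3")
    diff = [yi - vi for yi, vi in zip(y, nu)]
    norm2 = sum(d * d for d in diff)
    if norm2 == 0.0:
        # Assumed convention: the correction is undefined when y == nu.
        return list(y)
    factor = (n - 2) * sigma2 / norm2
    return [yi - factor * d for yi, d in zip(y, diff)]


if __name__ == "__main__":
    # Each component moves toward nu = 0.
    print(stein_estimate([3.0, 4.0, 0.0], 1.0, [0.0, 0.0, 0.0]))
    # Close to nu the factor exceeds 1 and components move beyond nu.
    print(stein_estimate([0.5, 0.5, 0.5], 1.0, [0.0, 0.0, 0.0]))
```

The second call illustrates the "beyond" case noted in the text: when $\|Y - \nu\|^2 < (n-2)\sigma^2$, every component crosses to the other side of $\nu_i$.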

Example 4.2 (Johnson (1984)): Suppose $Y$ is an $n \times 1$ vector of
independent random variables whose $i$th component has a
$B(\alpha_i, \beta_i)$ distribution with $s_i = \alpha_i + \beta_i$ known, but $\alpha_i$ and $\beta_i$
unknown. From Table 2.2, we may take $b_i(y_i)$, the $i$th
component of $B$, to be $s_i(\ln[y_i/(1-y_i)] - \ln[\nu_i/(1-\nu_i)])$,
where $\nu_i$ is any constant, $0 < \nu_i < 1$. With this choice of
$B$, (4.5) dominates $Y$ as an estimate of $\mu = (\alpha_1/s_1, \ldots, \alpha_n/s_n)^t$
for $n \ge 3$. As in the previous example, $\hat\mu_i$ may be thought of
as modifying the estimate $Y_i$ by moving it toward $\nu_i$.
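A minimal sketch of the beta case of (4.5) follows. The function name `beta_stein_estimate` is hypothetical, inputs are assumed to lie strictly inside (0, 1), and returning `y` unchanged when every $b_i$ is zero is our own convention. Note that nothing in the formula forces the result to stay inside (0, 1).

```python
import math
from typing import List, Sequence


def beta_stein_estimate(y: Sequence[float], s: Sequence[float],
                        nu: Sequence[float]) -> List[float]:
    """Estimate (4.5) with b_i = s_i * (logit(y_i) - logit(nu_i))."""
    n = len(y)
    if n < 3:
        raise ValueError("the dominance result requires n >= 3")
    b = [si * (math.log(yi / (1.0 - yi)) - math.log(vi / (1.0 - vi)))
         for yi, si, vi in zip(y, s, nu)]
    norm2 = sum(bi * bi for bi in b)
    if norm2 == 0.0:
        # Assumed convention: no correction when y_i == nu_i for all i.
        return list(y)
    factor = (n - 2) / norm2
    return [yi - factor * bi for yi, bi in zip(y, b)]


if __name__ == "__main__":
    # Observations 0.2 and 0.8 are pulled toward nu_i = 0.5.
    print(beta_stein_estimate([0.2, 0.5, 0.8], [10.0] * 3, [0.5] * 3))
```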


5. SUMMARY AND CONCLUSIONS


In this report, we have presented estimates of the mean
$\mu$ of a vector $Y$ of independent Pearson random variables.

The Pearson class of random variables, which includes
several well-known variates such as the normal, was
introduced in Section 2. With the notation defined in (2.2)
and (2.3), Tables 2.1 and 2.2 summarized the salient
features of particular Pearson variates. Throughout the
report, theoretical results were illustrated by a variety of
different Pearson random variables.

In Section 3, we examined estimates of $\mu$ of the form

\[
\hat\mu = y - cDy \tag{5.1}
\]

where $D = I - M_n$. These estimates may be thought of as a
compromise between a raw data estimate $y$ of $\mu$, and a
nonparametric estimate $M_n y$ of $\mu$. We determined the choice
of $c$, a real-valued function of $y$, yielding the smallest
Bayes risk for $\hat\mu$. Specifically, this choice of $c$ was found
to be

\[
c = c^* = \frac{(y - \delta^B)^t Dy}{\|Dy\|^2} \tag{5.2}
\]

where $\delta^B$ is the Bayes estimate of $\mu$. This was derived by
both geometric and analytic arguments. We also determined
the optimal choice of $c$ when $c$ was assumed to be of a
particular functional form. If $c(y) = d/\|Dy\|^2$, for instance,
then $d = d^*$ given by (3.10), or the equivalent (3.11), yields
the best performance in terms of Bayes risk.

Unfortunately, we were unable to determine $c$ for which
$\hat\mu$ dominates $Y$ in our Pearson setting except in the normal
case. We hope that the two dominance results for estimates
not of the form (5.1) (recall Theorem 4.1 and Theorem 4.2)
will aid in finding such a $c$. In particular, the sufficient
condition (4.3) given for $\delta^B$ to dominate $Y$ may help
establish simple conditions on $\delta^B$ in (5.2) so that $\hat\mu$
dominates $Y$. Also, perhaps, $c = c^*$ might be approximated to
yield a dominant estimate.

Appendix A

ERRATA FOR LI AND HWANG (1984)


In this appendix, we list some minor errors in Li and
Hwang (1984). The substance of their results is unaffected
by these corrections.

Under their Theorem 1 (all references to theorems and
equations in this appendix refer to Li and Hwang (1984)),
the right-hand side of the equality (2.9) should read

\[
(1 + o_p(1))\, n^{-1}\|M_n y - f_n\|^2 + o_p\big(E\, n^{-1}\|M_n y - f_n\|^2\big).
\]


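On the second line following (2.13),

\[ (1+x)^{-1} < 1-x \quad \text{for } x > 0 \]

should be replaced by

\[ (1+x)^{-1} > 1-x \quad \text{for } x > 0. \]

In the line following (2.14), replace

\[ +\, 2\big(2n^{-1} + 3(n^{-1}\operatorname{tr} M_n)^{1/2}\big)\, n^{-1}\|A_n y\|^2 \]

with

\[ +\, 2\big(2n^{-1} + 3(n^{-1}\operatorname{tr} M_n^2)^{1/2}\big)\, n^{-1}\|A_n y\|^2 . \]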


In the line below (2.20), we read "Finally (2.17) follows
from (2.16), (2.6) and (2.20)." We also need the fourth
moment of the $e_i$ to exist here.

On the right-hand side of the equality (2.21), replace

\[ o_p\big(n^{-1}\operatorname{tr} M_n^2 + n^{-1}\|A_n f_n\|^2 + n^{-1}\|M_n y - f_n\|\big)^2 \]

with

\[ o_p\big(n^{-1}\operatorname{tr} M_n^2 + n^{-1}\|A_n f_n\|^2 + n^{-1}\|M_n y - f_n\|^2\big). \]

Four lines below (2.25), the inequality

\[ \le (n^{-1}\operatorname{tr} M_n)^2 + m\, n^{-2}\operatorname{tr} M_n^2 \]

is not necessarily true. It suffices to have

\[ \le (n^{-1}\operatorname{tr} M_n)^2 + 2m\, n^{-2}\operatorname{tr} M_n^2 \]

instead.

Two lines above (2.26), replace

\[ o\big(E\, n^{-1}\|M_n y - f_n\|^2\big) = o_p\big(n^{-1}\|M_n y - f_n\|^2\big) \]

with

\[ o_p\big(E\, n^{-1}\|M_n y - f_n\|^2\big) = o_p\big(n^{-1}\|M_n y - f_n\|^2\big). \]

On the second line from the end of the proof of Theorem
1, on page 891, replace

\[ = 2(m+2)\,\lambda(M_n^2)\,(\operatorname{tr} M_n^2)^{-1}\big(E\|M_n y - f_n\|^2\big)^2 \]

with

\[ = 2(m+2)\,\lambda(M_n^2)\,(\operatorname{tr} M_n^2)^{-1}\big(E\, n^{-1}\|M_n y - f_n\|^2\big)^2 . \]

Appendix B


DECISION THEORY TERMINOLOGY


In this appendix, we review some standard terminology
used in decision theory.

Let $Y = (Y_1, \ldots, Y_n)^t$ be a vector of independent
random variables with the $i$th coordinate having a density of
$f_i(y_i|\mu_i)$. We understand $f_i(y_i|\mu_i)$ to denote a family of
densities indexed by the parameter $\mu_i$. Also, let $EY =
(EY_1, \ldots, EY_n)^t = \mu$. To estimate $\mu$ by $\varphi = \varphi(Y) =
(\varphi_1(Y), \ldots, \varphi_n(Y))^t$ we will use the squared error loss
function $L$ where

\[
L(\varphi, \mu) = (\varphi - \mu)^t(\varphi - \mu) = \sum_{i=1}^{n} (\varphi_i(Y) - \mu_i)^2 .
\]

The expected loss or risk, $R(\varphi, \mu)$, incurred in
estimating $\mu$ by $\varphi$ is then given by

\[
R(\varphi, \mu) \equiv E^{Y}_{\mu} L(\varphi, \mu) = \int L(\varphi, \mu)\, f(y|\mu)\, dy
\]

where $f(y|\mu) = \prod_i f_i(y_i|\mu_i)$ and $dy = \prod_i dy_i$.


The superscripts of the expectation symbol $E$ denote the
random variables with respect to which we are taking the
expectation. The subscripts of $E$ denote fixed parameters.
Such superscripts and subscripts are suppressed when these
are clear from the context.

We will say that $\varphi_1(Y)$ dominates $\varphi_2(Y)$ in estimating $\mu$
with respect to squared error loss provided

\[ R(\varphi_1, \mu) \le R(\varphi_2, \mu) \]

for all $\mu$, with strict inequality for some $\mu$. The phrase
with respect to squared error loss, since it is understood,
will generally be suppressed (we use parentheses to enclose
such phrases in what follows). Loss functions other than
squared error loss, of course, could be used.
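Risk and dominance can be explored numerically. The sketch below is an illustration of ours, not part of the report: it approximates $R(\varphi, \mu)$ by Monte Carlo for unit-variance normal observations and compares the raw estimate $Y$ with the James-Stein estimate. The helper names `mc_risk` and `james_stein`, the seed, and the replication count are all assumptions. A simulation at a few values of $\mu$ can only suggest dominance; it cannot establish it for all $\mu$.

```python
import random
from typing import Callable, List, Sequence

Estimator = Callable[[List[float]], List[float]]


def james_stein(y: List[float], sigma2: float = 1.0) -> List[float]:
    """James-Stein estimate (1 - (n-2)*sigma2/||y||^2) * y."""
    n = len(y)
    norm2 = sum(v * v for v in y)
    return [(1.0 - (n - 2) * sigma2 / norm2) * v for v in y]


def mc_risk(estimator: Estimator, mu: Sequence[float],
            reps: int = 4000, seed: int = 0) -> float:
    """Monte Carlo approximation of E ||estimator(Y) - mu||^2, Y ~ N(mu, I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(m, 1.0) for m in mu]
        est = estimator(y)
        total += sum((e - m) ** 2 for e, m in zip(est, mu))
    return total / reps


if __name__ == "__main__":
    for mu in ([0.0] * 5, [1.0] * 5, [3.0] * 5):
        raw = mc_risk(lambda y: list(y), mu)
        shrunk = mc_risk(james_stein, mu)
        print(mu, round(raw, 3), round(shrunk, 3))
```

Because both calls use the same seed, the two estimators are evaluated on the same simulated samples, which reduces the noise in the comparison.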

In the event there does not exist an estimator $\varphi = \varphi(Y)$
which dominates a particular estimator $\varphi^* = \varphi^*(Y)$, we call
$\varphi^*$ an admissible estimator. An estimator which is not
admissible is inadmissible.


One basis of comparison between two estimators $\varphi_1 =
\varphi_1(Y)$ and $\varphi_2 = \varphi_2(Y)$ can be made by examining how large
their risks may become as we vary $\mu$. In particular, we may
prefer $\varphi_1$ to $\varphi_2$ if

\[ \sup_{\mu} R(\varphi_1, \mu) < \sup_{\mu} R(\varphi_2, \mu) \]

and call an estimator minimax if it minimizes this supremum.
That is, $\varphi^*$ is minimax if

\[ \sup_{\mu} R(\varphi^*, \mu) = \inf_{\varphi} \sup_{\mu} R(\varphi, \mu). \]


If we have prior information about $\mu$ in the form of a
probability distribution $\pi(\mu)$ for $\mu$, then estimators may be
compared on the basis of their Bayes risk. The Bayes risk,
$r = r(\varphi) = r(\varphi, \pi)$, of an estimator $\varphi$ is given by a weighted
average of the risk. In particular

\[ r(\varphi) = r(\varphi, \pi) = \int R(\varphi, \mu)\, d\pi(\mu). \]

Note that the case of the letter r distinguishes whether we
are dealing with the (ordinary) risk or Bayes risk. We say
$\varphi^* = \varphi^*(Y)$ is a Bayes estimate of $\mu$ (with respect to the
prior distribution $\pi$) if

\[ r(\varphi^*, \pi) = \min_{\varphi} r(\varphi, \pi). \tag{B.1} \]


In the above discussion on Bayes estimates, we assume
that $\pi(\mu)$ is a probability distribution. That is, we assume
$\int d\pi(\mu) = 1$. Yet, even when $\int d\pi(\mu) = \infty$ we may still find
a solution to (B.1). The prior in this case is called an
improper prior and the resulting estimate is called a formal
Bayes estimate.

We will, at times, restrict our attention to a
particular class, $\mathcal{S}$, of estimates, $\varphi$, over which we will
take the above minimum. In this event, $\varphi^*$ is a Bayes
estimate with respect to the class $\mathcal{S}$ (and prior distribution
$\pi$). Such a Bayes estimate may be spoken of as a restricted
Bayes estimate.


Appendix  C 


A  ONE-SIDED  CHEBYSHEV  INEQUALITY 
WHEN  THE  FIRST  FOUR  MOMENTS  ARE  KNOWN 


Below, we recall a theorem of Bhattacharyya (1987)
which gives a bound for the tail probability of a random
variable whose first four moments are known.


Theorem C.1 (Bhattacharyya (1987)): Let $X$ be a random
variable with mean $\mu$ and let $\sigma^2$, $\mu_3$, $\mu_4$ be the second,
third, and fourth central moments, respectively. Also let
$s = \mu_3/\sigma^3$ and $k = \mu_4/\sigma^4$. For every non-negative $t$ satisfying
$t^2 - st - 1 > 0$,

\[
P(X - \mu \ge t\sigma) \le \frac{k - s^2 - 1}{(k - s^2 - 1)(1 + t^2) + (t^2 - st - 1)^2} . \tag{C.1}
\]


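For a Pearson random variable with parameters $\theta$, $\beta_0$,
$\beta_1$, and $\beta_2$, we have

\[
\begin{aligned}
\mu &= EX = (\theta + \beta_1)/(1 - 2\beta_2) \\
\sigma^2 &= E(X-\mu)^2 = Q(\mu)/(1 - 3\beta_2) \\
\mu_3 &= E(X-\mu)^3 = 2Q'(\mu)Q(\mu)/[(1 - 3\beta_2)(1 - 4\beta_2)] \\
\mu_4 &= E(X-\mu)^4 = 3Q(\mu)\big(2Q'(\mu)^2 + (1 - 4\beta_2)Q(\mu)\big)/[(1 - 3\beta_2)(1 - 4\beta_2)(1 - 5\beta_2)]
\end{aligned}
\]

where $Q(\mu) = \beta_0 + \beta_1\mu + \beta_2\mu^2$, provided these moments exist
and Theorem 2.1 holds for $h(x) = x^n$, $n = 0, 1, 2, 3$.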


Example C.1: If $X$ is $N(\mu, \sigma^2)$, then we find, by using Table
2.2 and the above moment relations, that $\mu_3 = 0$ and $\mu_4 = 3\sigma^4$.
Hence, inequality (C.1) holds for $t > 1$, with $s = 0$ and $k = 3$.


Example C.2: If $X$ is $\Gamma(\alpha, \beta)$, then we find, by using Table
2.2 and the above moment relations, that $\mu = \alpha/\beta$, $\sigma^2 = \alpha/\beta^2$,
$\mu_3 = 2\alpha/\beta^3$, and $\mu_4 = 3\alpha(\alpha+2)/\beta^4$. Consequently, the inequality
(C.1) holds for $t > [1 + (\alpha+1)^{1/2}]/\alpha^{1/2}$, with $s = 2/\alpha^{1/2}$ and
$k = 3 + 6/\alpha$.
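The bound (C.1) is easy to evaluate once $s$ and $k$ are known. The following is a minimal sketch with hypothetical helper names (`bhattacharyya_bound`, `gamma_shape_constants`, `gamma_threshold`) that covers the two examples above. Since $(t^2 - st - 1)^2 \ge 0$, the bound is never larger than the familiar one-sided Chebyshev bound $1/(1+t^2)$ whenever $k - s^2 - 1 > 0$.

```python
import math
from typing import Tuple


def bhattacharyya_bound(t: float, s: float, k: float) -> float:
    """Right-hand side of (C.1) for skewness s and kurtosis k."""
    q = t * t - s * t - 1.0
    if t < 0.0 or q <= 0.0:
        raise ValueError("(C.1) requires t >= 0 and t^2 - s*t - 1 > 0")
    c = k - s * s - 1.0
    return c / (c * (1.0 + t * t) + q * q)


def gamma_shape_constants(alpha: float) -> Tuple[float, float]:
    """(s, k) for a gamma variate with shape alpha, as in Example C.2."""
    return 2.0 / math.sqrt(alpha), 3.0 + 6.0 / alpha


def gamma_threshold(alpha: float) -> float:
    """Smallest t beyond which t^2 - s*t - 1 > 0 in Example C.2."""
    return (1.0 + math.sqrt(alpha + 1.0)) / math.sqrt(alpha)


if __name__ == "__main__":
    # Normal case (Example C.1): s = 0, k = 3, valid for t > 1.
    print(bhattacharyya_bound(2.0, 0.0, 3.0), 1.0 / (1.0 + 2.0 ** 2))
    # Gamma case (Example C.2) with shape alpha = 4.
    s, k = gamma_shape_constants(4.0)
    print(gamma_threshold(4.0), bhattacharyya_bound(2.0, s, k))
```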


Appendix  D 


BOUNDS  FOR  THE  VARIANCE  OF  A 
FUNCTION  OF  A  PEARSON  RANDOM  VARIABLE 

Klaassen (1985) presents upper and lower bounds for the
variance of a function, $G$, of an arbitrary random variable.
For continuous random variables, the bounds involve
derivatives of $G$, while for discrete random variables, the
bounds involve differences of $G$. Klaassen's result
generalizes the result established by Chernoff (1981) in the
case where the random variable is normally distributed.

In  this  appendix,  we  apply  the  work  of  Klaassen 
to  the  Pearson  class  of  densities. 


Theorem D.1: Let $X$ be a Pearson random variable on $(r, s)$
with finite variance $\sigma^2$ satisfying (2.4) with $h(x) = 1$. Then

\[
[E\, a(X)g(X)]^2/\sigma^2 \le \operatorname{Var} G(X) \le E[a(X)g(X)^2] \tag{D.1}
\]

where $g(X) = G'(X)$.

Proof: Apply Theorems 2.1 and 3.1 of Klaassen (1985), with
$\mu$ = Lebesgue measure, $\chi(x, y) = 1_{(b,x]}(y) - 1_{(x,b]}(y)$, where $b =
EX = (\theta + \beta_1)/(1 - 2\beta_2)$, $h(x) = (1 - 2\beta_2)$, and $H(x) = (1 - 2\beta_2)x - (\theta + \beta_1)$.


Example D.1: If $X$ is normal with mean $\mu$ and variance $\sigma^2$,
then $a(x) = \sigma^2$ (see Table 2.2), and (D.1) becomes

\[
\sigma^2 [E\, g(X)]^2 \le \operatorname{Var} G(X) \le \sigma^2 E[g(X)^2].
\]
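As a quick check of the normal case, consider a worked instance of our own choosing (it does not appear in the report): $G(x) = x^2$, so that $g(x) = 2x$. Then $E\,g(X) = 2\mu$, $E[g(X)^2] = 4(\mu^2 + \sigma^2)$, and $\operatorname{Var} X^2 = 4\mu^2\sigma^2 + 2\sigma^4$, so the two bounds read

```latex
\[
4\mu^2\sigma^2 \;\le\; 4\mu^2\sigma^2 + 2\sigma^4 \;\le\; 4\mu^2\sigma^2 + 4\sigma^4 ,
\]
% lower bound: sigma^2 (E g(X))^2 = sigma^2 (2 mu)^2
% upper bound: sigma^2 E[g(X)^2]  = 4 sigma^2 (mu^2 + sigma^2)
```

Both inequalities hold for every $\mu$ and $\sigma^2$, and each side differs from the variance by exactly $2\sigma^4$.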

Example D.2: If $X$ is a beta variate, then $\sigma^2 =
\alpha\beta/[(\alpha+\beta)^2(\alpha+\beta+1)]$ (see Table 2.1) and $a(x) = x(1-x)/(\alpha+\beta)$
(see Table 2.2) so that (D.1) becomes

\[
\frac{(\alpha+\beta+1)}{\alpha\beta}\, \big(E[X(1-X)g(X)]\big)^2
\le \operatorname{Var} G(X) \le
\frac{1}{(\alpha+\beta)}\, E[X(1-X)g(X)^2].
\]

We  now  apply  Klaassen's  result  to  discrete  Pearson 
random  variables.  These  random  variables  are  defined  on  p. 
83  of  Johnson  (1984).  Some  examples  appear  in  Table  D.l. 

Theorem D.2: Let $X$ be a discrete Pearson random variable on
$\{N_0, \ldots, N_1\}$ with finite variance $\sigma^2$. Then

\[
(E\, d(X)g(X))^2/\sigma^2 \le \operatorname{Var} G(X) \le E[d(X)g(X)^2]
\]

where

\[
\begin{aligned}
d(x) &= a(x) - (x - \mu), \\
a(x) &= (\beta_0 + \beta_1 x + \beta_2 x^2)/(1 - 2\beta_2), \\
\mu &= EX = (\theta + \beta_1 - 1)/(1 - 2\beta_2), \\
g(x) &= G(x+1) - G(x)
\end{aligned}
\]

provided

\[ \lim_{x \to N_i} a(x)f(x) = 0, \]

for $i = 0$ when $N_0 = -\infty$ and for $i = 1$ when $N_1 = \infty$.


Proof: Apply Theorems 2.1 and 3.1 of Klaassen (1985) with $\mu$
= counting measure, $\chi(x, y) = 1_{[[\nu],x)}(y) - 1_{[x,[\nu])}(y) -
(\nu - [\nu])\, 1_{\{[\nu]\}}(y)$, where $[\nu]$ denotes the integer part of $\nu$
and $\nu = EX = (\theta + \beta_1 - 1)/(1 - 2\beta_2)$, $h(x) = (1 - 2\beta_2)$, and $H(x) = (1 - 2\beta_2)x -
(\theta + \beta_1 - 1)$. ■
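Theorem D.2 can be checked numerically in the Poisson case. With the Table D.1 parameters $(\lambda, 0, 1, 0)$ we get $a(x) = x$, $\mu = \lambda$, and hence $d(x) = \lambda$. The sketch below is our own illustration: the function names and the truncation point `upper` are assumptions, and the infinite sums are approximated by summing over $0, \ldots,$ `upper`.

```python
import math
from typing import Callable, List, Tuple


def poisson_pmf(lam: float, upper: int) -> List[float]:
    """Poisson probabilities p(0), ..., p(upper) by forward recursion."""
    probs = [math.exp(-lam)]
    for y in range(upper):
        probs.append(probs[-1] * lam / (y + 1))
    return probs


def poisson_variance_bounds(G: Callable[[int], float], lam: float,
                            upper: int = 200) -> Tuple[float, float, float]:
    """Return (lower bound, Var G(X), upper bound) for X ~ Poisson(lam)."""
    pmf = poisson_pmf(lam, upper)
    g = [G(x + 1) - G(x) for x in range(upper + 1)]  # forward difference
    mean_G = sum(G(x) * p for x, p in enumerate(pmf))
    var_G = sum((G(x) - mean_G) ** 2 * p for x, p in enumerate(pmf))
    d = lam  # d(x) = a(x) - (x - mu) = x - (x - lam)
    lower = sum(d * gx * p for gx, p in zip(g, pmf)) ** 2 / lam  # sigma^2 = lam
    upper_bound = sum(d * gx * gx * p for gx, p in zip(g, pmf))
    return lower, var_G, upper_bound


if __name__ == "__main__":
    print(poisson_variance_bounds(lambda x: x * x, 2.0))
```

For $G(x) = x^2$ the three quantities have the closed forms $4\lambda^3 + 4\lambda^2 + \lambda$, $4\lambda^3 + 6\lambda^2 + \lambda$, and $4\lambda^3 + 8\lambda^2 + \lambda$, which gives a hand check on the numerical output.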


Table D.1. Some discrete Pearson random variables

Name | Probability Distribution | $(\theta, \beta_0, \beta_1, \beta_2)$

Poisson | $e^{-\lambda}\lambda^y / y!$, $y = 0, 1, 2, \ldots$ | $(\lambda, 0, 1, 0)$

Negative Binomial | $\binom{r+y-1}{y} p^y q^r$, $y = 0, 1, 2, \ldots$ | $((r-1)p/q,\ 0,\ 1/q,\ 0)$ where $q = (1-p)$

Discrete t (Ord (1968)) | $\alpha \prod_{i=0}^{k} [(y+a+i)^2 + b^2]^{-1}$, $y = \ldots, -1, 0, 1, \ldots$; $0 \le a < 1$, $0 < b^2 < \infty$, $k$ a non-negative integer | $\big((1-k-2a)/2,\ ((a+k)^2 + b^2)/2(k+1),\ [2(a+k)+1]/2(k+1),\ 1/2(k+1)\big)$


